Abstract: The increasing use of the Internet requires Internet service providers to handle large volumes of data. MapReduce is one of the effective solutions for implementing large-scale distributed data applications. A MapReduce workload generally contains a set of jobs, each of which consists of multiple map tasks followed by multiple reduce tasks. Because (1) map tasks can only run in map slots and reduce tasks can only run in reduce slots, and (2) map tasks are generally executed before reduce tasks, different job execution orders and map/reduce slot configurations for a MapReduce workload yield significantly different performance and system utilization. Makespan and total completion time are two key performance metrics. This paper proposes two classes of algorithms to optimize these two metrics. The first class focuses on job ordering optimization for a MapReduce workload under a given map/reduce slot configuration. The second class considers the scenario in which the map/reduce slot configuration itself can be optimized for a MapReduce workload.
Keywords: MapReduce, Hadoop, Flow-shops, Scheduling algorithm, Job ordering.
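To illustrate why job ordering affects both metrics named in the abstract, the following is a minimal sketch and not the algorithms proposed in this paper. It assumes a heavily simplified model: one map slot, one reduce slot, and each job reduced to a single map phase followed by a single reduce phase (a two-stage flow shop). The function name `evaluate`, the job names, and the durations are all hypothetical, and the search is a plain brute force over permutations.

```python
from itertools import permutations


def evaluate(order, jobs):
    """Return (makespan, total_completion_time) for jobs run in `order`.

    `jobs` maps a job name to (map_time, reduce_time).
    Assumed model: one map slot, one reduce slot, and a job's reduce
    phase cannot start before its own map phase has finished.
    """
    map_free = 0      # time at which the map slot becomes free
    reduce_free = 0   # time at which the reduce slot becomes free
    total = 0         # sum of job completion times
    for name in order:
        map_time, reduce_time = jobs[name]
        map_free += map_time
        # Reduce starts once both the reduce slot and this job's map are done.
        reduce_free = max(reduce_free, map_free) + reduce_time
        total += reduce_free
    return reduce_free, total


# Hypothetical workload: job name -> (map time, reduce time).
jobs = {"A": (4, 1), "B": (1, 4), "C": (2, 2)}

# Brute-force comparison of every job order (feasible only for tiny workloads).
results = {order: evaluate(order, jobs) for order in permutations(jobs)}
best_makespan_order = min(results, key=lambda o: results[o][0])

for order, (makespan, total) in sorted(results.items()):
    print(order, "makespan =", makespan, "total completion time =", total)
print("best order by makespan:", best_makespan_order)
```

Under these assumptions, the order A, B, C gives a makespan of 11 and a total completion time of 25, while B, C, A gives 8 and 20, so the same workload on the same slots performs quite differently depending only on the order in which jobs are submitted.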